Report Title

Multimodal Signal Processing for Personnel Detection and Activity Classification for Indoor Surveillance

ABSTRACT 

The goal of this project was to develop novel schemes for the fusion of heterogeneous information. The target application was the detection
and classification of personnel activity in both indoor and outdoor environments under dependent observations. We have identified features
and designed a classifier that achieves up to 95% accuracy in classifying occupancy from indoor footstep data. MDL-based
copula selection strategies are investigated, and a vine-based detector is designed that extends previous bivariate copula-based detectors
to a multi-sensor application. Our copula-based detectors yield more than 40% improvement over conventional data fusion
methodologies. We extend our solution to quantized sensor information and demonstrate that injecting controlled noise can dramatically
reduce computational complexity with insignificant performance loss. We propose and derive the Conditional Posterior Cramer-Rao Lower
Bound (CPCRLB) for online tracking. We demonstrate that the PCRLB-based iterative approach converges quickly with significantly
reduced computational cost as compared to a one-shot approach. Detector design in the presence of security threats, such as data falsification
attacks on sensor networks, is also addressed. Error-control codes and decoding algorithms are used to reliably classify data in a network
containing human agents.
Statement of the problem studied

To  develop  algorithms  for  distributed  inference  applications,  such  as  detection,  classification,  and  target 
localization,  for  data  obtained  from  heterogeneous  sensors  (including  human  sources). 

Summary  of  most  important  results 

1.  Data  collection. 

A  data  collection  effort  at  Syracuse  University  has  resulted  in  a  signal  database  generated  from  human 
indoor  activity  monitored  by  heterogeneous  (multi-modal)  sensors.  The  sensor  suite  comprised  seismic, 
acoustic, and video modalities. The experiments conducted under this setup focused on a
controlled scenario to enable a better understanding of signal behavior under excitation from a human
source. The environment was representative of a typical office setup consisting of a hallway with
adjoining  rooms.  These  experiments  have  led  to  the  design  of  personnel  detection  and  occupancy 
classification strategies (see items 2 and 3). Along with this indoor data, we have also collaborated with
the  US  Army  Research  Laboratory  and  have  applied  our  algorithms  to  an  outdoor  data-set  collected  at 
the  US  southwest  border. 

2.  Copula-based  dependence  characterization  for  inference  with  heterogeneous  sensor  data. 

We  have  used  copulas  to  characterize  the  dependence  and  joint  distribution  between  different  sensor 
observations.  Copula  theory  allows  us  to  specify  a  multivariate  distribution  by  coupling  disparate 
marginal  distributions  of  individual  sensor  observations.  In  other  words,  an  explicit  functional 
relationship  between  the  joint  distribution  and  its  corresponding  marginal  distributions  is  specified  and 
is  called  the  copula  function.  Several  copulas  are  defined  in  the  literature,  and  therefore,  copula 
selection  is  an  important  part  of  the  inference  problem. 
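
The coupling described above is usually stated through Sklar's theorem. The block below restates it in notation that is assumed here and does not come from the report: N sensors, marginal CDFs F_n with densities f_n, a copula C with density c, and superscripts (0) and (1) for the two hypotheses. It also shows how the log-likelihood ratio splits into a marginal part and a dependence part.

```latex
% Sklar's theorem: joint CDF and joint density in terms of marginals and a copula
F(x_1,\dots,x_N) = C\bigl(F_1(x_1),\dots,F_N(x_N)\bigr),
\qquad
f(x_1,\dots,x_N) = \Bigl[\prod_{n=1}^{N} f_n(x_n)\Bigr]\,
                   c\bigl(F_1(x_1),\dots,F_N(x_N)\bigr)

% Resulting log-likelihood ratio for a binary hypothesis test
\log \Lambda(\mathbf{x})
  = \sum_{n=1}^{N} \log \frac{f_n^{(1)}(x_n)}{f_n^{(0)}(x_n)}
  + \log \frac{c_1\bigl(F_1^{(1)}(x_1),\dots,F_N^{(1)}(x_N)\bigr)}
              {c_0\bigl(F_1^{(0)}(x_1),\dots,F_N^{(0)}(x_N)\bigr)}
```

When the observations are independent under both hypotheses, both copula densities equal one and the second term vanishes, which leaves the usual product (independence) fusion rule.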

2.1  Contributions  to  theory 

We  developed  relevant  theory  for  copula-based  inference. 

(a)  Copula-based  detection/classification.  We  have  addressed  the  copula  selection  issue  for  the 
detection  problem  using  approaches  based  on  area  under  the  ROC  and  average  loss  of  detection 
power  [1,  2].  Asymptotic  performance  loss  due  to  misspecification  of  the  copula  function  in 
terms  of  error  exponents  is  also  quantified. 

(b)  Multivariate  dependence.  We  have  developed  a  detection  scheme  that  considers  multi-sensor 
dependence  using  a  copula-based  approach  [3].  Past  applications  using  the  copula-based 
approach  have  mostly  been  limited  to  the  bivariate  (2  sensor)  case.  Our  detector  is 
asymptotically  optimal  in  the  Neyman-Pearson  sense.  A  multivariate  copula  is  constructed 
based  on  the  theory  of  vines.  A  novel  tail  dependence  based  node  ordering  scheme  is  proposed. 
Copula  selection  is  performed  using  various  minimum  description  length  (MDL)  criteria.  We 
have  shown  that  accounting  for  multivariate  dependence,  along  with  the  node  ordering,  leads 
to  significant  improvement  over  a  bivariate  approach. 


(c)  Distributed  detection  and  detection  with  dependent  quantized  data.  Sensor  observations  (or 
features  extracted  thereof)  are  most  often  quantized  before  their  transmission  to  the  fusion 
center for bandwidth and power conservation [1, 4]. A detection scheme is proposed for this
problem  assuming  uniform  scalar  quantizers  at  each  sensor.  The  designed  rule  is  applicable  for 
both  binary  and  multibit  local  sensor  decisions.  An  alternative  suboptimal  but  computationally 
efficient  fusion  rule  is  also  designed  which  involves  injecting  a  deliberate  disturbance  to  the 
local  sensor  decisions  before  fusion.  The  rule  is  based  on  Widrow's  statistical  theory  of 
quantization.  Addition  of  controlled  noise  helps  to  linearize  the  highly  nonlinear  quantization 
process  thus  resulting  in  computational  savings.  It  is  shown  that  although  the  introduction  of 
external  noise  does  cause  a  reduction  in  the  received  signal  to  noise  ratio,  the  proposed 
approach  can  be  highly  accurate  when  the  input  signals  have  bandlimited  characteristic 
functions,  and  the  number  of  quantization  levels  is  large.  The  proposed  fusion  rule  reduces 
computational complexity from O(2^N) to O(N log N).

(d)  Nonstationary  dependence.  The  problem  of  detection  for  dependent,  non-stationary  signals 
where  the  non-stationarity  is  encoded  in  the  dependence  structure  is  considered  in  [5].  A 
sample-wise  copula  selection  scheme  for  a  simple  hypothesis  test  is  proposed.  We  prove  that 
the  proposed  scheme  performs  better  than  previously  used  single  copula  selection  schemes. 
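
The additive-noise view of quantization that item (c) relies on can be illustrated numerically. The sketch below is not the fusion rule from the report. It only checks the classical prediction of Widrow's theory that, for a smooth input and a sufficiently fine uniform quantizer, the quantization error behaves like zero-mean noise with variance step^2/12. The Gaussian input, the step size, and all function names are assumptions chosen for illustration.

```python
import math
import random


def uniform_quantize(x, step):
    """Mid-tread uniform scalar quantizer with step size `step`."""
    return step * math.floor(x / step + 0.5)


def quantization_error_stats(samples, step):
    """Return (mean, variance) of the quantization error q(x) - x."""
    errors = [uniform_quantize(x, step) - x for x in samples]
    mean = sum(errors) / len(errors)
    var = sum((e - mean) ** 2 for e in errors) / len(errors)
    return mean, var


# Assumed setup: unit-variance Gaussian input, step equal to half a standard deviation.
rng = random.Random(0)
step = 0.5
samples = [rng.gauss(0.0, 1.0) for _ in range(100_000)]

err_mean, err_var = quantization_error_stats(samples, step)
predicted_var = step ** 2 / 12  # variance of a uniform variable on [-step/2, step/2]
print(f"error mean {err_mean:.5f}, error variance {err_var:.5f}, "
      f"step^2/12 = {predicted_var:.5f}")
```

With a much coarser quantizer the additive-noise approximation degrades, which is consistent with the report's remark that the approach is accurate when the number of quantization levels is large.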

2.2  Applications 

The  developed  theories  were  applied  to  the  following  applications. 

(a)  Footstep  detection  for  indoor  seismic/acoustic  data.  A  generalized  likelihood  ratio  test  (GLRT) 
based  approach  is  applied  to  the  seismic-acoustic  fusion  problem  for  footstep  detection  [2]. 
Spectral  features  (STFT)  are  calculated  locally,  at  the  sensor  level.  The  fusion  center  uses  a 
canonical  correlation  analysis  (CCA)  to  transform  the  spectral  signatures  so  that  the  features  are 
maximally  correlated.  Copulas  are  used  to  characterize  the  nonlinear  dependence  and  a  GLRT 
decides  the  presence  or  absence  of  a  footstep  signal.  The  maximization  of  the  likelihood 
function  is  done  over  both  the  parameter  space  of  a  given  copula,  as  well  as  the  family  of 
copulas  considered  in  the  library.  The  copula  selection  is  thus  contained  within  the  GLRT  and  is 
performed  online.  The  generalized  Gaussian  distribution  is  used  to  model  marginal  statistics. 
The detector is tested against normal walk, brisk walk and running activities. For a 0.05
false-alarm probability, the copula-based approach provides, on average, a 7% improvement in
detection rate over product fusion under the independence assumption. The copula-based
approach is extended to a semi-parametric framework [6], wherein we investigate the effect of
ignoring the marginal distribution on detector performance. The marginal distribution in many
cases is difficult to model due to non-stationarity and temporal dependence. The Akaike Information
Criterion (AIC) and the Bayesian Information Criterion (BIC) are heuristic measures of information
that are adopted for copula selection. Detection performance is evaluated for the seismic
footstep data. The receiver operating characteristics show that the copula selected using either
AIC or BIC is the best performing copula, i.e., the detector based on that copula has the highest
probability of detection for a fixed probability of false alarm.

(b)  Footstep  detection  for  outdoor  seismic/acoustic  data.  A  copula-based  detector  using  the 
Neyman-Pearson  framework  is  designed.  This  detector  implements  the  sample-wise  copula 
selection  scheme.  We  demonstrate  the  utility  of  our  copula-based  approach  on  simulated  data, 
and  also  for  outdoor  sensor  data  collected  by  the  Army  Research  Laboratory  at  the  US 
southwest  border.  Our  results  show  a  40%  improvement  in  the  probability  of  detection  over 
conventional  fusion  schemes  [5]. 

(c)  Classification  of  heavy-tailed  signals.  The  copula-based  multiinformation  estimate  is  applied  to  a 
classification  problem  with  heavy-tailed  signals.  Our  copula-based  features  give  a  classification 
accuracy of up to 85% on test data [7]. The test data were EEG signals that were used to classify
early onset of Alzheimer's disease.

(d)  Biometric  classification.  The  copula  framework  yields  superior  results  when  applied  to  a 
multi-biometric person recognition dataset (NIST BSSR1) [8]. We use a training-testing paradigm
for  this  task.  The  copula  parameters  are  learned  during  training,  and  thus  during  testing  we  do 
not  need  to  estimate  the  parameters.  Copula  selection  is  done  offline  and  two  methods  are 
proposed  based  on  (a)  area  under  the  ROC,  and  (b)  area  under  the  probability  of  detection  curve. 
The latter reduces computational complexity and provides a systematic KL-divergence-based
approach for copula selection.
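
The AIC/BIC-based copula selection mentioned in item (a) can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in [6]: the candidate library contains only the independence copula and the bivariate Gaussian copula, the parameter is fitted by a coarse grid search, and the data are synthetic. Pseudo-observations are formed from ranks so that the marginal distributions are ignored, in the spirit of the semi-parametric framework. All function and variable names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def pseudo_observations(values):
    """Rank-transform a sample to (0, 1): u_i = rank_i / (n + 1)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u


def gaussian_copula_loglik(xs, ys, rho):
    """Gaussian copula log-likelihood, given normal scores xs and ys."""
    one_minus = 1.0 - rho * rho
    total = -0.5 * len(xs) * math.log(one_minus)
    for x, y in zip(xs, ys):
        total += (2.0 * rho * x * y - rho * rho * (x * x + y * y)) / (2.0 * one_minus)
    return total


def select_copula(u, v):
    """Score each candidate family by AIC and BIC; return the AIC winner and all scores."""
    n = len(u)
    std_normal = NormalDist()
    xs = [std_normal.inv_cdf(p) for p in u]
    ys = [std_normal.inv_cdf(p) for p in v]
    # Maximum-likelihood fit of rho by a coarse grid search (illustrative only).
    grid = [i / 100 for i in range(-95, 96)]
    rho_hat = max(grid, key=lambda r: gaussian_copula_loglik(xs, ys, r))
    # Each entry: (log-likelihood, number of parameters, fitted parameter).
    fits = {
        "independence": (0.0, 0, None),  # copula density is 1 everywhere
        "gaussian": (gaussian_copula_loglik(xs, ys, rho_hat), 1, rho_hat),
    }
    scores = {
        name: {"aic": 2 * k - 2 * ll, "bic": k * math.log(n) - 2 * ll, "param": p}
        for name, (ll, k, p) in fits.items()
    }
    best = min(scores, key=lambda name: scores[name]["aic"])
    return best, scores


# Synthetic dependent pair with different marginals (lognormal and normal).
rng = random.Random(1)
rho_true, n = 0.7, 1000
z1 = [rng.gauss(0, 1) for _ in range(n)]
z2 = [rng.gauss(0, 1) for _ in range(n)]
sensor_a = [math.exp(a) for a in z1]
sensor_b = [rho_true * a + math.sqrt(1 - rho_true ** 2) * b for a, b in zip(z1, z2)]

best, scores = select_copula(pseudo_observations(sensor_a), pseudo_observations(sensor_b))
print(best, scores[best])
```

A realistic library would add further families (for example Clayton, Gumbel, Frank, or Student-t), each contributing its own fitted log-likelihood and parameter count to the same comparison.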

3.  Feature  identification  and  neural  network  classification  for  occupancy  classification 

We  have  proposed  a  feature  selection  scheme  for  occupancy  classification  [9].  The  classifier  aims  to 
determine  whether  there  is  exactly  one  occupant  or  more  than  one  occupant.  Data  are  obtained  from 
six  seismic  sensors  (geophones)  that  are  deployed  in  a  typical  building  hallway.  After  exploring  multiple 
alternatives,  we  identified  four  features  as  being  capable  of  assisting  classification  with  a  high  degree  of 
reliability.  The  four  proposed  features  exploit  amplitude  and  temporal  characteristics  of  the  seismic  time 
series.  A  neural  network  classifier  achieves  performance  ranging  from  77%  to  95%  on  the  test  data, 
depending  on  the  type  of  construction  of  the  location  in  the  building  being  monitored. 

4.  Estimation 

Several  estimation  problems  under  various  scenarios  were  investigated. 

(a)  Conditional  Posterior  Cramer-Rao  Lower  Bounds  for  nonlinear  sequential  Bayesian  estimation 
[10].  We  have  proposed  and  developed  a  new  measure  of  online  tracking  performance:  the 
Conditional  Posterior  Cramer-Rao  lower  bound  (CPCRLB).  Posterior  Cramer-Rao  lower  bounds 
(PCRLBs)  for  sequential  Bayesian  estimators  provide  a  performance  bound  for  a  general 
nonlinear  filtering  problem.  However,  the  unconditional  PCRLB  is  an  off-line  bound  whose 
corresponding  Fisher  information  matrix  (FIM)  is  obtained  by  taking  the  expectation  with 
respect  to  all  the  random  variables,  namely  the  measurements  and  the  system  states.  As  a 
result, the unconditional PCRLB is not well suited for adaptive resource management for
dynamic systems. The new concept of conditional PCRLB is dependent on the actual observation
data  up  to  the  current  time,  and  is  implicitly  dependent  on  the  underlying  system  state. 
Therefore,  it  is  adaptive  to  the  particular  realization  of  the  underlying  system  state,  and 
provides  a  more  accurate  and  effective  online  indication  of  the  estimation  performance  than  the 
unconditional  PCRLB.  Both  the  exact  conditional  PCRLB  and  its  recursive  evaluation  approach 
including  an  approximation  are  derived.  A  general  sequential  Monte  Carlo  solution  is  proposed 
to  compute  the  conditional  PCRLB  recursively  for  nonlinear  non-Gaussian  sequential  Bayesian 
estimation  problems.  Simulation  results  show  quick  convergence  for  the  recursive  computation 
of the CPCRLB.

(b)  Source  localization  in  wireless  sensor  networks.  A  well-known  method  for  static  source 
localization uses the energy readings of all sensors in the network. However, transmitting all
sensor  data  to  the  fusion  center  may  introduce  communication  and  energy  overhead.  We 
consider  the  source  localization  problem  in  the  presence  of  networked  sensors.  We  propose  an 
energy  efficient  iterative  localization  scheme,  where  the  algorithm  starts  with  a  coarse  location 
estimate  obtained  from  a  set  of  anchor  sensors  [11].  A  subset  of  the  non-anchor  sensors  is 
selected  and  corresponding  sensors  are  activated  in  each  iteration.  The  observations  from  these 
sensors  are  used  to  minimize  the  Posterior  Cramer-Rao  Lower  Bound  (PCRLB).  Using  the 
available  information  received  at  previous  iterations  as  side  information,  the  quantized  data  of 
each  activated  sensor  is  further  compressed  to  conserve  energy  using  distributed  data 
compression techniques prior to transmission to the fusion center. Simulation results show that,
within a few iterations, the proposed iterative method achieves the same estimation performance as
when all the sensors transmit their quantized data to the fusion center, while at the
same time significantly reducing the communication requirements, resulting in energy savings.
For  selecting  sensors  at  each  iteration,  we  have  developed  and  compared  two  sensor  selection 
metrics based on mutual information (MI) and PCRLB. We show that our PCRLB-based method
performs as well as the MI-based approach, but does so with a significantly reduced
computational burden. As a function of the number of selected sensors, the complexity of
computing the MI grows exponentially, while the complexity of PCRLB computation increases only
linearly. We extend our work to account for channel fading between the sensors and the fusion center,
considering both complete and partial channel knowledge at the fusion center [12].
Simulation results show that partial channel knowledge at the fusion center achieves
estimation performance very close to that obtained with complete channel knowledge.

(c)  Distributed  estimation  under  non-identical  noise  distribution.  We  consider  the  problem  of 
distributed  estimation  with  heterogeneity  in  measurement  error  [13].  The  observation  noise  of 
each sensor is independent but not identically distributed, and each sensor transmits a different
amount of data to the fusion center depending on the quality of the sensor observation. We
show that the complexity of computing the average PCRLB (A-PCRLB) is high. We have, therefore,
developed the inverse of the average Fisher information as a lower bound on the A-PCRLB. We
assume a constraint on the total bandwidth. Each sensor, sending data at a specific quantization
rate, uses a certain transmission probability to send its data to the fusion center. Under
stringent bandwidth constraints, simulation results show that the proposed probabilistic
transmission scheme outperforms, in terms of MSE, the scheme in which the total bandwidth is
equally distributed among sensors.

(d)  Coalitional  game  for  distributed  estimation.  In  [14],  we  consider  a  collaborative  estimation 
problem  using  dependent  observations  in  a  wireless  sensor  network,  where  each  sensor  aims  to 
maximize  its  estimation  performance  in  terms  of  Fisher  information  (FI)  by  forming  coalitions 
with  other  sensors  and  collaborating  within  a  coalition.  The  energy  consumed  by  the  sensors 
increases  with  the  size  of  the  coalition.  The  distributed  estimation  problem  is  formulated  by 
taking  into  account  this  trade-off  between  minimized  energy  consumption  and  maximized  Fisher 
information, and is, therefore, cast in a game theoretic framework. We prove that the grand
coalition  will  not  form.  We  investigate  the  formation  of  non-overlapping  coalitions  such  that 
each  sensor's  performance  is  maximized  under  a  specific  energy  constraint.  We  decouple 
marginal  and  dependent  components  of  FI  obtained  from  the  joint  distribution  by  using  copula 
theory.  We  introduce  the  novel  concepts  of  diversity  gain  and  redundancy  loss  and  demonstrate 
how  a  copula-based  formulation  allows  us  to  characterize  these  concepts.  A  merge-and-split 
algorithm  is  used  for  finding  an  optimal  partition. 
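
As background for item (a), the sketch below evaluates the standard recursive form of the unconditional PCRLB in the simplest possible setting: a scalar linear-Gaussian model x_{k+1} = f x_k + w_k, z_k = h x_k + v_k with noise variances q and r. In this special case the bound coincides with the Kalman filter error variance, which gives a convenient sanity check. The conditional bound (CPCRLB) proposed in the report additionally conditions on the received measurements and is not reproduced here. The model parameters and function names are assumptions made for illustration.

```python
def pcrlb_information_recursion(j0, f, h, q, r, steps):
    """Fisher information J_k from J_{k+1} = D22 - D12^2 / (J_k + D11), scalar case."""
    d11 = f * f / q              # curvature of log p(x_{k+1} | x_k) in x_k
    d12 = -f / q                 # cross term between x_k and x_{k+1}
    d22 = 1.0 / q + h * h / r    # prior-transition term plus measurement term
    infos = [j0]
    for _ in range(steps):
        j = infos[-1]
        infos.append(d22 - d12 * d12 / (j + d11))
    return infos


def kalman_variance_recursion(p0, f, h, q, r, steps):
    """Posterior error variance of the scalar Kalman filter for the same model."""
    variances = [p0]
    for _ in range(steps):
        p_pred = f * f * variances[-1] + q
        gain = p_pred * h / (h * h * p_pred + r)
        variances.append((1.0 - gain * h) * p_pred)
    return variances


# Assumed model parameters, chosen only for illustration.
f, h, q, r, p0, steps = 0.9, 1.0, 0.5, 1.0, 2.0, 20

bounds = [1.0 / j for j in pcrlb_information_recursion(1.0 / p0, f, h, q, r, steps)]
variances = kalman_variance_recursion(p0, f, h, q, r, steps)
print(f"PCRLB after {steps} steps: {bounds[-1]:.6f}, Kalman variance: {variances[-1]:.6f}")
```

For nonlinear or non-Gaussian models the expectations inside the D terms have no closed form, which is where Monte Carlo evaluation such as the sequential Monte Carlo solution mentioned in item (a) becomes necessary.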

5.  Distributed  inference  in  the  presence  of  Byzantine  (unreliable)  sensor  nodes 

Byzantines are those sensor nodes that alter their observations, and thus provide false
information  to  the  fusion  center.  In  distributed  detection  systems,  where  nodes  make  one  bit  decisions 
regarding  the  presence  of  a  phenomenon  and  collaboratively  make  a  global  decision  at  the  fusion  center 
(FC),  Byzantines  essentially  flip  their  decisions  before  transmission  to  the  FC.  The  performance  of  such 
systems  strongly  depends  on  the  reliability  of  the  nodes  in  the  network.  The  robustness  of  distributed 
detection systems against attacks is of utmost importance for their functioning.
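
The effect of decision flipping on the fusion center can be illustrated with a small calculation. The sketch below is a simplified model assumed for illustration and is not the optimal design from [15]: each node is Byzantine independently with probability alpha, Byzantine nodes always flip their one-bit decision, honest nodes share the same detection and false-alarm probabilities, and the FC applies a simple majority rule with equal priors. All names and numerical values are hypothetical.

```python
from math import comb


def report_probability(p_honest, alpha, flip_prob=1.0):
    """Probability that a randomly chosen node reports 1 when a fraction alpha is Byzantine."""
    p_byzantine = flip_prob * (1.0 - p_honest) + (1.0 - flip_prob) * p_honest
    return (1.0 - alpha) * p_honest + alpha * p_byzantine


def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


def majority_error_probability(n, pd, pf, alpha, prior_h1=0.5):
    """Error probability of majority-vote fusion of n one-bit decisions."""
    k = n // 2 + 1                       # majority threshold
    p1 = report_probability(pd, alpha)   # P(report 1 | H1)
    p0 = report_probability(pf, alpha)   # P(report 1 | H0)
    false_alarm = binomial_tail(n, k, p0)
    miss = 1.0 - binomial_tail(n, k, p1)
    return (1.0 - prior_h1) * false_alarm + prior_h1 * miss


# Assumed network: 21 nodes, local detection probability 0.8, false-alarm probability 0.2.
errors = {alpha: majority_error_probability(21, 0.8, 0.2, alpha)
          for alpha in (0.0, 0.2, 0.4, 0.5)}
for alpha, err in errors.items():
    print(f"Byzantine fraction {alpha:.1f}: error probability {err:.4f}")
```

In this model, when half of the nodes flip their decisions the reports have the same distribution under both hypotheses, so the majority rule does no better than a coin toss.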

The  problem  of  optimal  distributed  detection  with  independent  identical  sensors  in  the  presence  of 
Byzantine  attacks  is  considered  in  [15].  By  considering  an  attacker  to  be  strategic  in  nature,  we  address 
the  issue  of  designing  the  optimal  fusion  rule  and  the  local  sensor  thresholds  that  minimize  the 
probability  of  error  at  the  fusion  center  (FC).  We  have  addressed  the  problem  of  finding  the  optimal 
fusion  rule  under  the  constraint  of  fixed  local  sensor  thresholds  and  fixed  Byzantine  strategy.  We  have 
also  considered  the  problem  of  joint  optimization  of  the  fusion  rule  and  local  sensor  thresholds  for  a 
fixed  Byzantine  strategy.  These  results  are  extended  to  the  scenario  where  both  the  FC  and  the 
Byzantine  attacker  act  in  a  strategic  manner  to  optimize  their  own  utilities.  We  model  the  strategic 
behavior of the FC and the attacker using game theory and show the existence of a Nash equilibrium. We
observed  that  the  joint  optimization  solution  is  independent  of  the  Byzantine  parameters. 

The problem of covert attacks by Byzantine nodes is addressed in [16]. We introduce the problem
of intelligent data falsification attacks on distributed detection systems. We propose a scheme to detect
data falsification attacks and analytically characterize its performance. We determine the optimal
attacking strategy from the point of view of a smart adversary that seeks to disguise itself from the proposed
detection scheme while accomplishing its attack.

In  [17],  an  energy  efficient  localization  scheme  is  proposed  by  modeling  it  as  an  iterative 
classification  problem.  We  designed  coding  based  iterative  approaches  for  target  localization.  The  FC 
iteratively  solves  an  M-ary  hypothesis  test  and  decides  the  Region  of  Interest  (ROI)  for  the  next  iteration. 
We  also  consider  the  presence  of  Byzantine  (malicious)  sensors  in  the  network.  We  investigate  the 
localization  scheme  over  non-ideal  channels  and  propose  the  use  of  soft-decision  decoding  to 
compensate  for  the  loss  due  to  the  presence  of  fading  channels  between  the  local  sensors  and  the  FC. 
We show that the proposed soft-decision decoding scheme outperforms previously proposed
hard-decision decoding schemes, both in terms of target detection probability and the mean
square error of the location estimate. Using the soft-decoding scheme, we demonstrate an improvement of
about 20% in detection probability and about a 50% reduction in the MSE for localization.

6.  Classification  using  human  workers 

In  [18],  we  consider  the  use  of  error-control  codes  and  decoding  algorithms  to  perform  reliable 
classification  using  unreliable  and  anonymous  human  crowd  workers  by  adapting  coding-theoretic 
techniques  for  the  specific  crowdsourcing  application.  We  develop  an  ordering  principle  for  the  quality 
of  crowds  and  describe  how  system  performance  changes  with  the  quality  of  the  crowd.  We 
demonstrate  the  effectiveness  of  the  proposed  coding  scheme  using  both  simulated  data  and  real 
datasets  from  Amazon  Mechanical  Turk,  a  crowdsourcing  microtask  platform.  Results  suggest  that  good 
codes  may  improve  the  performance  of  the  crowdsourcing  task  over  typical  majority-vote  approaches. 
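
The coding-theoretic idea can be sketched with a toy example. The code matrix, class labels, and function names below are hypothetical and are not taken from [18]. Each class is assigned a binary codeword, worker j answers only the yes/no question encoded by bit j, and the classifier picks the class whose codeword is closest in Hamming distance to the received answers. The codewords chosen here have minimum pairwise distance 3, so any single wrong answer is corrected.

```python
# Hypothetical code matrix: one 6-bit codeword per class, minimum Hamming distance 3.
CODE_MATRIX = {
    "class_0": (0, 0, 0, 0, 0, 0),
    "class_1": (1, 1, 1, 0, 0, 0),
    "class_2": (0, 0, 0, 1, 1, 1),
    "class_3": (1, 1, 0, 1, 1, 0),
}


def hamming_distance(a, b):
    """Number of positions in which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def worker_answers(true_class, flipped_workers=()):
    """Binary answers of the workers; workers listed in flipped_workers answer wrongly."""
    codeword = CODE_MATRIX[true_class]
    return tuple(bit ^ 1 if j in flipped_workers else bit
                 for j, bit in enumerate(codeword))


def decode(answers):
    """Minimum Hamming distance decoding over the code matrix."""
    return min(CODE_MATRIX, key=lambda label: hamming_distance(CODE_MATRIX[label], answers))


# One unreliable worker (index 2) gives the wrong answer for an object of class_3.
received = worker_answers("class_3", flipped_workers={2})
print(received, "->", decode(received))
```

The appeal of this design is that workers answer only simple binary questions, and redundancy in the code, rather than repetition of the full multi-class question, absorbs individual mistakes.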